ABSTRACT
Infectious disease surveillance frequently lacks complete information on race and ethnicity, making it difficult to identify health inequities. The COVID-19 pandemic raised awareness of this issue: inequities in cases, hospitalizations, and deaths were reported, but with evidence that demographic details were substantially incomplete. Although the problem of missing race and ethnicity data in COVID-19 cases has been well documented, neither its spatiotemporal variation nor its particular drivers have been characterized. Using individual-level data on confirmed COVID-19 cases in Massachusetts from March 2020 to February 2021, we show that missing race and ethnicity data: (1) varied over time, appearing to increase sharply during two different periods of rapid case growth; (2) differed substantially between towns, indicating a nonrandom distribution; and (3) were significantly associated with several individual- and town-level characteristics in a mixed-effects regression model, suggesting a combination of personal and infrastructural drivers of missing data that persisted despite state and federal data-collection mandates. We discuss how a variety of factors may contribute to persistent missing data and how they could potentially be mitigated in future contexts.
ABSTRACT
BACKGROUND: The COVID-19 pandemic has highlighted the need for targeted local interventions given substantial heterogeneity within cities and counties. Publicly available case data are typically aggregated to the city or county level to protect patient privacy, but more granular data are necessary to identify and act upon community-level risk factors that can change over time. METHODS: Individual COVID-19 case and mortality data from Massachusetts were geocoded to residential addresses and aggregated into two time periods: "Phase 1" (March to June 2020) and "Phase 2" (September 2020 to February 2021). Institutional cases associated with long-term care facilities, prisons, or homeless shelters were identified using address data and modeled separately. Census tract sociodemographic and occupational predictors were drawn from the 2015-2019 American Community Survey. We used mixed-effects negative binomial regression to estimate incidence rate ratios (IRRs), accounting for town-level spatial autocorrelation. RESULTS: Case incidence was elevated in census tracts with higher proportions of Black and Latinx residents, with larger associations in Phase 1 than in Phase 2. The association between case incidence and the proportion of essential workers was similarly elevated in both Phases. Mortality IRRs followed different patterns than case IRRs, decreasing less substantially between Phases for Black and Latinx populations and increasing between Phases for proportion of essential workers. Mortality models excluding institutional cases yielded stronger associations for age, race/ethnicity, and essential worker status. CONCLUSIONS: Geocoded home address data can allow for nuanced analyses of community disease patterns, identification of high-risk subgroups, and exclusion of institutional cases to comprehensively reflect community risk.
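As a rough illustration of the quantities named in METHODS, the sketch below assigns a case date to one of the two phases and shows how an incidence rate ratio relates to a coefficient from a log-link count model. It is not the authors' analysis: the exact day boundaries of each phase, the function names, and the unadjusted ratio are assumptions made for illustration, and the mixed-effects negative binomial model itself is not reproduced here.

```python
from datetime import date
from math import exp

# Assumed phase windows: the abstract gives month ranges only, so the
# first and last calendar days of those months are used here.
PHASES = {
    "Phase 1": (date(2020, 3, 1), date(2020, 6, 30)),
    "Phase 2": (date(2020, 9, 1), date(2021, 2, 28)),
}


def assign_phase(case_date):
    """Return the phase label containing case_date, or None if outside both."""
    for name, (start, end) in PHASES.items():
        if start <= case_date <= end:
            return name
    return None


def irr_from_coefficient(beta):
    """In a log-link count model, the IRR for a one-unit increase is exp(beta)."""
    return exp(beta)


def crude_rate_ratio(cases_a, pop_a, cases_b, pop_b):
    """Unadjusted ratio of two incidence rates (cases per person)."""
    return (cases_a / pop_a) / (cases_b / pop_b)
```

A crude ratio of this kind ignores covariates and clustering; the mixed-effects model described in METHODS adjusts for tract-level predictors and town-level correlation, so its IRRs would generally differ from unadjusted values.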